Improving reading and recreating fit pseudodata #1328

siranipour · 2021-07-14T12:30:06Z

Now that Python is used to construct the pseudodata, I've updated the function in pseudodata.py that regenerates the pseudodata of a given fit. It's now much easier to do this in parallel because we don't use SWIG objects anymore.

Closes #1323

The tests will, however, fail for now until I regenerate a PDF that saves its python pseudodata for the regression test. @wilsonmr, I may be misremembering but did you at some point have a PR which saved the pseudodata of a fit?

EDIT: This has turned into a bit of a general improvements PR.

- Corrected the recreation of a fit's pseudodata. I've swapped the C++ make_replica for the Python one.
- Improved the reading of pseudodata from a fit which has fitting::savepseudodata: True.
- There was a function that created the tr_mask for all replicas:

  nnpdf/validphys2/src/validphys/n3fit_data.py, line 386 in b636ae0:

  def training_mask_table(replicas_exps_tr_masks, replicas, experiments_index):

  I've now changed this so that it works on an individual replica and is then collected over replicas (I did this because it was useful for recreating the training validation mask).
- Fixed the tests.
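To illustrate the per-replica idea, here is a minimal sketch in plain Python. It is not the validphys implementation: the names training_mask and collect_over_replicas, the dataset names and sizes, and the use of the replica number as the random seed are all assumptions made for the example.

```python
import random


def training_mask(dataset_sizes, frac, seed):
    """Hypothetical per-replica action: one boolean training mask per dataset.

    True marks a training point, False a validation point.
    """
    rng = random.Random(seed)  # assumption: each replica gets its own seed
    masks = {}
    for name, ndata in dataset_sizes.items():
        ntrain = int(frac * ndata)
        mask = [True] * ntrain + [False] * (ndata - ntrain)
        rng.shuffle(mask)
        masks[name] = mask
    return masks


def collect_over_replicas(action, replicas, **kwargs):
    """Stand-in for collecting a per-replica action over a list of replicas."""
    return [action(seed=rep, **kwargs) for rep in replicas]


# Hypothetical dataset names and sizes, for illustration only.
dataset_sizes = {"DATASET_A": 8, "DATASET_B": 4}

replica_masks = collect_over_replicas(
    training_mask, replicas=[1, 2, 3], dataset_sizes=dataset_sizes, frac=0.75
)
```

Because the mask of a single replica depends only on that replica's seed, it can be recomputed in isolation, which is what makes the same function reusable when recreating the training validation split of one replica.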

wilsonmr · 2021-07-14T12:39:26Z

Now that the replica generation is fast, can we not just call the action on a set of replicas? Is there really that much speed-up from multiprocessing (MP) now? In fact, since the action is in validphys, can we not just pass --parallel in the usual way to get the MP?

wilsonmr · 2021-07-14T12:44:21Z

> @wilsonmr, I may be misremembering but did you at some point have a PR which saved the pseudodata of a fit?

This got added in #1081 I think with the savepseudodata flag:

nnpdf/n3fit/src/n3fit/scripts/n3fit_exec.py

Line 131 in 74d9183

if file_content["fitting"].get("savepseudodata"):
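As a rough illustration of how such a check behaves, the sketch below applies the same .get lookup to a runcard that has already been parsed into a dictionary. The helper name should_save_pseudodata and the example runcards (including the epochs key) are hypothetical. Unlike the line quoted above, the sketch also tolerates a missing fitting section, which is an assumption of the example rather than the behaviour of n3fit_exec.py.

```python
def should_save_pseudodata(file_content):
    """Return True only if the parsed runcard enables fitting::savepseudodata.

    A missing flag (or, in this sketch, a missing 'fitting' section)
    counts as disabled.
    """
    return bool(file_content.get("fitting", {}).get("savepseudodata"))


# Hypothetical parsed runcards, for illustration only.
runcard_with_flag = {"fitting": {"savepseudodata": True}}
runcard_without_flag = {"fitting": {"epochs": 900}}

save_on = should_save_pseudodata(runcard_with_flag)
save_off = should_save_pseudodata(runcard_without_flag)
```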

wilsonmr · 2021-07-14T12:45:39Z

In particular, calling the API inside a validphys action seems really horrible. I don't think this should be an internal action, or even exist at all.

siranipour · 2021-07-14T13:05:47Z

Oh true actually, let me see if the --parallel flag works

wilsonmr · 2021-07-14T14:44:56Z

Just to say, the function I mentioned, pseudodata_table, was not actually updated to use make_replica in #1268, which is why it is so painfully slow. It also means that the function is currently useless, because it's not even dumping the right pseudodata during the fit haha

wilsonmr · 2021-07-14T14:58:17Z

how long does this take now with the collect approach @siranipour ?

siranipour · 2021-07-15T10:07:01Z

> Just to say, the function I mentioned, pseudodata_table, was not actually updated to use make_replica in #1268, which is why it is so painfully slow. It also means that the function is currently useless, because it's not even dumping the right pseudodata during the fit haha

Ah ok haha, I'll try and get another PR for that then because the tests will fail until we regenerate a new regression fit.

> how long does this take now with the collect approach @siranipour ?

Unfortunately I was a bit pushed for time, so I didn't get to benchmark it fully, but it still took a non-negligible O(few minutes). I'll take a proper look, play with the --parallel flag, and see what happens.

github-actions · 2021-07-21T19:02:00Z

Greetings from your nice fit 🤖 !
I have good news for you, I just finished my tasks:

Fit Name: NNBOT-aab94f0a6-2021-07-21
Fit Report: https://vp.nnpdf.science/ENmhRHSUSfqmR6gt4zmOOw==
Fit Data: https://data.nnpdf.science/fits/NNBOT-aab94f0a6-2021-07-21.tar.gz

Check the report carefully, and please buy me a ☕ , or better, a GPU 😉!

siranipour · 2021-07-21T19:20:32Z

Any ideas why the regression tests fail? They pass locally for me...

siranipour · 2021-07-27T12:03:25Z

Ahh, I generated the new baseline with theory 200, which is why the tests are failing. Will fix.

siranipour · 2021-09-03T11:40:48Z

@scarlehoff this PR flies in tandem with #1333, in that it reproduces the pseudodata a posteriori, which should in principle be the same as that saved by #1333. I would appreciate it if you could take a look at this soon too.

The basic idea is that I've added actions that work on a per-replica basis, i.e. replicate the pseudodata of replica_i. We then collect over either NSList(range(nrep), nskey='replica') to replicate all replicas, or produce_pdfreplicas, which is basically the same thing but accounting for the postfit reshuffling.
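A minimal sketch of that pattern in plain Python, under stated assumptions: make_pseudodata is a hypothetical stand-in that draws uncorrelated Gaussian fluctuations (the real pseudodata generation is more involved), the replica index is used directly as the seed, and the pdf_to_fit_replica mapping is invented to mimic the postfit reshuffling. None of these names come from validphys.

```python
import random


def make_pseudodata(central_values, uncertainties, seed):
    """Hypothetical per-replica action.

    Simplification: independent Gaussian noise around each central value.
    """
    rng = random.Random(seed)
    return [rng.gauss(cv, unc) for cv, unc in zip(central_values, uncertainties)]


# Made-up central values and uncertainties, for illustration only.
central_values = [1.0, 2.0, 3.0]
uncertainties = [0.1, 0.2, 0.3]

# First option: collect the per-replica action over all fit replicas.
nrep = 4
fit_pseudodata = {
    rep: make_pseudodata(central_values, uncertainties, seed=rep)
    for rep in range(nrep)
}

# Second option: collect over PDF replicas instead. Postfit may drop and
# reorder fit replicas, so each PDF replica points back to a fit replica.
# This mapping is invented for the example.
pdf_to_fit_replica = {0: 2, 1: 0, 2: 3}
pdf_pseudodata = {
    pdf_rep: fit_pseudodata[fit_rep]
    for pdf_rep, fit_rep in pdf_to_fit_replica.items()
}
```

The point of the second collection is only the reindexing: the pseudodata itself is generated exactly as for the fit replicas, and the mapping decides which fit replica each PDF replica corresponds to.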

siranipour marked this pull request as draft July 14, 2021 12:30

siranipour force-pushed the replicate_pseudodata branch from d2e5304 to 9d9ae85 Compare July 14, 2021 12:37

siranipour force-pushed the replicate_pseudodata branch from 70ae840 to ccde1b2 Compare July 14, 2021 14:31

This was referenced Jul 19, 2021

Saving new python pseudodata during fit #1333

Merged

Improved reading of fit pseudodata #1334

Merged

siranipour force-pushed the replicate_pseudodata branch from ba389c9 to 62c4521 Compare July 21, 2021 09:47

siranipour changed the title ~~Recreating python psueododata~~ Recreating a fit's python psueododata Jul 21, 2021

siranipour requested review from scarlehoff and Zaharid July 21, 2021 16:07

siranipour changed the title ~~Recreating a fit's python psueododata~~ Improving reading and recreating fit pseudodata Jul 21, 2021

siranipour marked this pull request as ready for review July 21, 2021 16:12

siranipour added the run-fit-bot label (starts fit bot from a PR) Jul 21, 2021

siranipour force-pushed the replicate_pseudodata branch 4 times, most recently from 0df61c3 to 2c49d34 Compare July 27, 2021 09:14

siranipour force-pushed the replicate_pseudodata branch from 2c49d34 to 30280ea Compare July 27, 2021 13:08

siranipour added 3 commits August 31, 2021 14:46

Recreating python psueododata

bf189b7

Using collect approach

46e6bbf

Adding docstrings

a53adfd

siranipour added 12 commits August 31, 2021 14:46

Removing unused import

679384e

Changing logic of reading pseudodata

347b942

Removing os import

faf554d

Deleting old and now obsolete training validation function

8f146d0

Making recreates return same type as read

207d4db

Returning DataTrValSpec list

a4c601f

Collecting over replicas

2d8214d

Adding other seeds to fitenvironment

c39703d

Fixing tests

cad699a

Removing unused imports

5a89b3b

Fixing example in docstrings

24184b8

Fixing header of test file

747f7a5

siranipour force-pushed the replicate_pseudodata branch from 30280ea to 747f7a5 Compare August 31, 2021 13:47

scarlehoff removed the run-fit-bot label (starts fit bot from a PR) Sep 1, 2021

Merge branch 'master' into replicate_pseudodata

1d9bffd

scarlehoff reviewed Sep 3, 2021

validphys2/src/validphys/tests/test_pseudodata.py (outdated, resolved)

Formatting test_pseudodata.py

bed5b93

scarlehoff reviewed Sep 3, 2021

validphys2/src/validphys/n3fit_data.py (resolved)

validphys2/src/validphys/config.py (resolved)

validphys2/src/validphys/pseudodata.py (outdated, resolved)

validphys2/src/validphys/pseudodata.py (outdated, resolved)

siranipour and others added 2 commits September 14, 2021 17:19

Fixing examples

978ab4e

Apply suggestions from code review

3b9bd07

Co-authored-by: Juan M. Cruz-Martinez <juacrumar@lairen.eu>

scarlehoff approved these changes Sep 15, 2021

Logger warning that we overwrite seeds

3114c38

siranipour merged commit b7e1e14 into master Sep 15, 2021

siranipour deleted the replicate_pseudodata branch September 15, 2021 15:18

Zaharid mentioned this pull request Sep 23, 2021

Fitenvironment is broken for nnpdfcpp fits #1414

Closed

Zaharid added the enhancement label (new feature or request) Oct 29, 2021
